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Abstract 


This study proposes a two-part model that includes components for reading accuracy 
and reading speed. The speed component is a log-normal factor model, for which speed data 
are measured by reading time for each sentence being assessed. The accuracy component 
is a binomial-count factor model, where the accuracy data are measured by the number of 
correctly read words in each sentence. Both underlying latent components are assumed to 
be Gaussian in nature. In this paper, the theoretical properties of the proposed model are 
developed and an Monte Carlo EM algorithm for model fitting is outlined. The predictive 
power of the model is illustrated in a real data application. 


1 Introduction 


As part of screening assessments of reading ability in schools, oral reading fluency (ORF) is 
frequently assessed to identify students at-risk for poor learning outcomes in order to help 
guide and inform instructional decision-making for such students (e.g., Fuchs, 2004). In 
traditional ORF test administration, a student is given one minute to read as many words 
as possible in a grade-level text of approximately 250 words. During the test administration, 
a trained assessor follows along and indicates on a scoring protocol each word the student 
reads incorrectly. After one minute, the total number of words correctly read is obtained. 
If the student finishes reading the entire passage within one minute, the time took to read 
the passage is also obtained. Then, they are transformed into a reporting score, namely, the 
number of words correctly read per minute (wcepm). 

Despite prevalent use and practical application of ORF measures, the current standard 
assessment of ORF has considerable psychometric limitations which potentially make ORF 
measures less reliable and valid. This paper is part of an effort to develop and establish 
a new ORF assessment system. The new ORF assessment system incorporates centralized 
scoring based on recorded reading both by human assessor and speech recognition engine. As 
a consequence, the assessment system collects accuracy and time data at the word, sentence 
and passage levels. Availabilities of word and sentence level data enables one to estimate 
reading speed beyond the traditional wcpm scores. Thus, this study proposes and evaluates 
a psychometric model to estimate reading speed by a latent variable model that incorporates 
speed and accuracy jointly. 
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through Grant R305A140203 to the University of Oregon. The opinions expressed are those of the authors and do not 
represent views of the Institute or the U.S. Department of Education. 


The proposed model is a modification of a speed-accuracy model proposed by van der 
Linden (2007). van der Linden’s model is a two-part model for speed and accuracy when 
taking a test. The speed component is a log-normal factor model, for which speed data 
are measured by time taken (such as second) to respond to each item being assessed. The 
accuracy component is a 3-parameter logistic (3-PL) item response theory model, for which 
accuracy data are measured by correct-incorrect item responses. In the present study, the 
speed part of the model also follows a log-normal factor model as formulated by van der 
Linden, and the speed data are measured by the time it took to read each sentence. On 
the other hand, this study utilizes a a two-component normal ogive binomial-count factor 
model, where the accuracy data are measured by the number of correctly read words in each 
sentence. 

The approach to model fitting here is explicitly frequentist in nature, while the model 
proposed by van der Linden (2007) was fit in a hierarchical Bayes framework. Hierarchical 
Bayes is a common approach with these type of latent variable models - see for example 
Fox et al. (2007), Entink et al. (2009) and van der Linden et al. (2010). In this paper, 
we propose an EM algorithm for fitting the model to avoid the computational complexity 
associated with direct maximization of the likelihood function. The EM algorithm was first 
proposed by Dempster, Laird & Rubin (1977) for performing maximum likelihood estimation 
in the presence of missing data; see also the recent monograph by McLachlan & Krishnan 
(2007) and the references therein. The modeling approach adopted in this paper makes 
use of a variation of the EM algorithm known as the Monte Carlo EM (MCEM) algorithm, 
details of which can be found in Wei & Tanner (1990). 

The outline of the paper is as follows. In Section 2, the proposed model is outlined 
and some of its theoretical properties are discussed. Section 3 discussed two approaches to 
parameter estimation, a method of moments approach and maximum likelihood estimation 
via the MCEM algorithm. Section 4 presents simulation results comparing the two proposed 
methods. Finally, an analysis of a real dataset is presented in Section 5. This analysis 
includes an investigation of the out-of-sample predictive ability of the proposed model. After 
concluding remarks in Section 6, the derivation of some technical results and the sampling 
algorithm used for implementation of the MCEM method are outlined in an appendix. 


2 Oral Reading Fluency Model 


So far, we have described the context of the model assuming that the unit of analysis was a 
sentence. However, the unit can be a word, a sentence or a paragraph. Therefore, the generic 
term item will be used hereafter to refer to the unit of analysis. Let N = (N,,..., Nz) denote 
the vector of total words per item where items are indexed i = 1,..., J. For the 7” individual, 
j=l,...,n, let Y; = (Y1;,-.-,¥1;) denote the response vector containing number of words 
read correctly for each of the J items, and let T; = (Z1;,...,77;) denote the response vector 
of reading times for each of the J items. Furthermore, let €; = (6;,7;) denote the pair of 
latent variables with 6; the reading accuracy factor and 7; the reading speed factor. In the 
model, it is assumed that, conditional on ability, words correct per item follows a Binomial 
distribution, 

Yij|0; a PGND: (9;)) (1) 
where f denotes the binomial mass function with item size N; and success probability 
pi (9;) = ® (a; (0; — 0;)). Here a; € R* and 6; € R are interpreted as the discrimination 
and difficulty of the i” item’s differentiation of reading accuracy, respectively, and © (-) de- 
notes the standard normal cumulative distribution function. Conditional on latent speed, a 
log-normal model is specified for response time, so that 


logTij|T ~ aid (a (tiy — Bi + 75) (2) 


where ¢;; denotes the natural logarithm of the time it takes individual 7 to read item 7. 
Here a; € R* and §; € R are interpreted as the discrimination and intensity parameters 
of the i” item’s differentiation of reading speed. Here, ¢(-) denotes the standard normal 
density function. Combining (I) and (2), the joint distribution of response vector (Y,, logT;) 
conditional on €; is given by 


Ff (yj, t3|€3) = na Yaz; Mi, Pi (O;)) cad (04 (tay — Bi + 75)) 


where y; = (yij;,---,Yz;) and t; = (t;,...,t7;) denotes the observed counts and logarithmic 
times for the 7” individual. Note that for a given item, the words correct count and reading 
time are conditionally independent given latent vector €;. It is assumed that this latent vector 
€; follows a bivariate normal distribution with mean vector fg = (Jug, 4) and covariance 
matrix 
5, = 05 Cor 
Oo, OO. 


For model identifiability, it is necessary to i impose constraints on these parameters and there- 
fore it is assumed that jug = 4, = 0 and that of = 1, while og, and o? are free parameters. No 
distributional assumptions are made for the aS Beane parameters { (a4, bi, 4, Bi) Jian 
This could certainly be incorporated into the model and would result in a nonlinear random 
effects model. However, the model developed here does not assume items are randomly 
drawn from some population of items. Rather, the model is developed conditional on the 
collection of J items used in assessing reading accuracy and speed. 
The unconditional distribution of response vector (Y,;,logT;) is given by 


f (y;, t;) =f We (yas; Ni, Di (8 )) aP (a, (¢ ij =G,--7)) 2 (€; we, Ue) d& (3) 


where the integral is taken over the real plane € = (0,7) € R? and where @y (-; 4, ©) denotes 
the bivariate normal density with mean y and covariance &. 

It is not uncommon for datasets to contain missing values, as an individual may not read 
all items in the alloted time. Recording errors can also result in missing values. It is assumed 
that, for the j individual and i” item, either both or neither the count y;; and log-time t;; 
are observed. When missing, the pair of measurements is assumed to be missing completely 
at random. Let S; C {1,2,..., 1} denote the set of items for which count and time variables 
were observed for the 7 individual. The joint distribution which takes missing values into 
account can be written as 


f (y;, ty |S;) =f I] - (wigs Nag ree )) aid (a; (t ij =e) 2 (€; we, Ue) d€. (4) 


t€S; 


The integral form of this joint distribution makes direct maximization of the likelihood 
function an untenable proposition. This, in part, speaks to the popularity of Bayesian 
methods in estimating complex latent factor models that are nonlinear in nature. One of 
the contributions the present paper makes to the literature is the development of an EM 
algorithm for maximizing a likelihood function based on (3). The EM algorithm, as well as 
a method of moments approach to parameter estimation, is discussed in the next section. 


3 Parameter Estimation 


3.1 Method of Moments 


In this subsection, a method of moments (MOM) method is proposed for estimating the 
parameter vectors { (i, bi, 4, Bi) Fina as well as the parameters a? and o9,. The MOM 
estimator, while interesting in their own right, also provide useful starting values for the 
EM algorithm which is outlined in the next subsection. The moments of log7;,; follow from 
properties of the normal distribution and derivation is omitted. The moments of the count 
variables Y;; and the covariance between Y;; and log7;,; are derived in Appendix 7.1. 

For the i” item, the word count Y;; has mean 


E(Y;;) = N® (-ts | (5) 


and variance 


az 
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where ®(-,-|@) denotes the bivariate normal cumulative distribution function with zero 
means, unit variances and correlation coefficient p. The model can accommodate both over- 
and under-dispersed count data, as the unconditional variance of Y;; can be either larger or 
smaller than the mean for appropriate choices of parameter values a; and 0;. 

Next, consider the logarithm of reading time per item, log7;;. Using standard properties 
of the normal distribution, it follows that 


a,d; a,d; 


—Vfi+e Vite 
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Var (Yi;) = N? >. ( 


+N; 


1 
Var (logTy) = 02 + (8) 
and 
Cov (logT;;, logTy;) = 0? (9) 


where 7 # i’. Finally, the covariance between the word count and the logarithm of reading 
time is 


nj a? me 1 aeb 
Cov (hon logT;;) = a = (a4) exp (-32 i :) : (10) 


Method of moments estimators are found by replacing the population moments in equa- 
tions (5), (), @, (@&), @) and by their sample equivalents and then solving for the 
unknown parameters. This can sometimes be done in multiple ways, specifically when there 
are more moment equations than unknown parameters. Therefore, the estimators presented 
below are only one possible way of finding MOM estimators. 

For the i“” item, let y; and a denote the sample mean and variance of observed counts y;;, 


calculated over the set of individuals with non-missing responses for the i” item, {j : 7 € S;}. 


Similarly, let t; and st denote the sample mean and variance of the i*” item’s log-times t,;. 
Now, let ; be the correlation coefficient that solves estimating equation 


= = 2 — =< 
+ uly = 1 
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N; (Ni — 1) 
Subsequently, estimators of a; and b; are given by 


@ “aria 
a= ( a) (12) 


(11) 


and ih 
: (1 + 4?) -1( ¥ 
—— a ee 13 
ay Ni (13) 
fori =1,...,7. Now, let s;,4,, denote the sample covariance between the log-times of items 


i and i’, covariance calculated over the set {j : i,i’ € S;}, ie., individuals for which both 
items 7 and 7’ are observed. An estimator of o? is given by 


I I 
2 
a2 
=r ee 14 
0, I (J a 1) d. 2 St, ty ( ) 


Subsequently, define 


(15) 
(en nt — Te 
max 0, (s?, — 6?) =| 
and a 
B= t (16) 
fori = 1,...,/. The estimator in contains a finite-sample correction to ensure that 


a; > 0 for all 7. Note that this moment-estimator of a; can be infinite, but this doesn’t 
pose a problem as Var(Y;;) in depends on the inverse of a;. Finally, let s,, 4, denote the 
covariance between word count and the log-time for the i‘” item and define 


I A —1/2 AQT2 

: (27)? Siok ae 1 ab; 
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Equations (12) through are the proposed MOM estimators of the model parameters. 
These MOM estimators have an advantage over the maximum likelihood estimators in that 
they are fast and easy to calculate. However, the MOM estimators are typically less effi- 
cient than the maximum likelihood estimators when compared using root mean square error 
(RMSE) as a criterion. This will be illustrated in the simulation study section of this paper. 


3.2 Monte Carlo EM Algorithm 


The EM algorithm, originally proposed by Dempster et al. (1977), is a method of performing 
maximum likelihood estimation in the presence of missing and/or latent variables. The ORF 
model being considered in this paper can be placed squarely in the original mold for which 
the method was developed by treating the latent vectors €; as missing. The EM algorithm 


takes the log-likelihood function of the full data — observed variables (Y;;, Tj) and unobserved 
variables €; — and then iterates between calculating the expected value of “the log-likelihood 
function conditional on the observed random variables (E-step) and maximizing the function 
obtained in said step in terms of the model parameters (M-step). This iterative process is 
repeated until convergence of the model parameters is achieved. In this subsection, the two 
steps of the EM algorithm are formalized in the context of the ORF model. Furthermore, a 
Monte Carlo approach for the E-step is proposed, as closed form expressions are not available 
for the conditional expectations that need to be calculated. 
The complete data likelihood function is 


Le = [Fe nas:( (yj, tj, Oj, T)|S;) 
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which can also be written as 


-T II ()s [a; (0; — bi)]"* {1 — ® [a; (0; — b,))} 
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Let = denote the collection of all model parameters and let Ey denote some initial values 
for these parameters (possibly the MOM estimators from the previous subsection). Let Ei, 
denote the parameter estimates obtained after the k” iteration of the EM algorithm. Define 
the conditional expectation function 


Q (=, a a Es, , [lege | (y;,t;, S;) iJ a Las 2251, 


where the conditional expectation is evaluated treating =,_; as the true parameter values. 
Here, up to a constant of proportionality that does not depend on the parameter values, 


logl 5 S| yijlog® | a; (6; — b;)| + 3 So (N, — yiz) log {1 —  [a; (8; — ;)]} 
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and thus, 


Q (=, =,1) = 3 > vw Es,, Hlog® [ai (0; — bi)l|ys, ts, 85| 
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At each step of the EM algorithm, the conditional expectation terms in (I8) are evaluated 


treating &,_, as the true parameter values and thereafter new maximizer &; is found. This 
iterative process is repeated until convergence of the algorithm is achieved. 

For the ORF model under consideration, there are no closed form solutions for the condi- 
tional expectations in (18). However, it is possible to sample from the conditional distribution 


6; 
to use the MCEM algorithm as described in Wei & Tanner (1990). Let ew”) = (a, i), 
m=1,...,M, denote M;, independent draws from lésly Tey; S.), the distribution of 


the latent vector €; conditional on the observed (non-missing) values (y;,t;) and assuming 
true parameter values &;,_ ;. Define the Monte Carlo approximation to (18), 


Vis byiOy, Ey). This sampling algorithm is outlined in the appendix. This allows one 


Mn, n 
Qr-1 (2) = wi DD wstoe [ (ay ~ bi) 
m=1 j a 
ey i — Yi )log {1 - ® fa; (as — 2)| } 
m=1 j=1 ieS; 
Mn on 
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At each iteration of the MCEM algorithm, this function is maximized and thereafter 
the updated parameter estimates are used to generate a new sample to update the function 
Q. Due to the stochastic nature of the MCEM algorithm, Wei & Tanner (1990) propose 
selecting a small value M;, for the first several iterations. After these iterations, the updated 
solution is typically in the part of the parameter space ”close to” the maximum likelihood 
solution. Thereafter, a large value of M;, ensures that the maximum of Q is close to the 
maximum of Q. 


03.) 


4 Simulation Study 


Several simulation studies were performed to compare the method of moments and Monte 
Carlo EM-based maximum likelihood estimators. The simulation study is motivated by 
the knowledge that the MOM estimators can be computed very quickly, while the MCEM 
estimators are time-consuming to compute. A representative example of one such simulation 
is presented here. This paper does not attempt to make a strict recommendation of one type 
of estimator over the other, but does illustrate the difference between the two methods in 
terms of RMSE. 

For the i’” individual, a pair of latent traits (0;, 7;) was simulated from a bivariate normal 
distribution with fixed parameters jug = 4, = 0 and of = 1. For the simulation presented 
here, 0? = 0.24155? and og, = —0.18116. Note that the variance parameter o? is a scale- 
dependent parameter. That is, the scale of measurement for time, in this simulation taken 
to be minutes, affects the size of the parameter. The parameter o9,, was chosen to give 
a correlation of p = —0.75 between the two latent traits. These latent trait vectors were 
() pp) 


then used to simulate data under two different item configurations. Data es rol ) was 


generated under a scenario with J = 2 items each consisting of N; = 50 words, while data 


ya?) was generated under a scenario with J = 4 items each consisting of N; = 
25 words. These parameter specifications all correspond approximately to an average oral 
reading speed of 122.5 WPM and success rate of 98 WCPM. Data were generated with 


sample sizes n € {40, 100, 250}. 


In Table [1] the means and variances of the words correct counts A and reading times 


(not log-scale) r? are summarized for the two simulation scenarios = 1,2. Model param- 


eters can easily be recovered from these using the moment equations through and are 
therefore not presented separately. Note that the parameters were chosen so that the total 


number of words read correctly, >°, es and total reading time, )/, jpeg have, conditional 
on the latent traits, the same means and variances for both scenarios ¢ = 1, 2. 

A total of 500 samples were simulated in this way. For each sample, the MOM and 
MCEM estimators were calculated. The MCEM algorithm consisted of K = 13 iterations 
where the first 10 iterations used M = 20 and the last three iterations used M = 200 Monte 
Carlo imputations. As the parameters (a;,};,a;,0;), i = 1,...,2 were the same across all 
I items in the simulation specifications, the average standard error (ASE) and average root 
mean square error (ARMSE), found by calculating the average of the item-specific standard 
errors and root mean square errors, are reported in Tables 2] and [3B] The quantities are also 


scaled by n'/?, the asymptotic convergence rate of the parameters. 


Inspection of Tables 2] and] reveal several interesting points. Firstly, note that ARMSE 
is only slightly larger than ASE. On average, for the MOM estimates, ARMSE represents 
approximately a 1% increase over ASE, while for the ML estimates, ARMSE represents 
about a 0.7% increase over ASE. This is strongly indicative that both the MOM and ML 
parameter estimates are nearly unbiased. Furthermore, with the exception of the precision 
parameters a;, ASE and ARMSE are relatively stable with respect to sample size. However, 
when the sample size is small, there is great uncertainty in estimating a;, as is evident from 
the large ASE and ARMSE values at n = 40. Here, the ML estimators of the precision 
parameters have substantially lower variability than the MOM estimators. For a;, there is 
between a 10% and 50% reduction in ARMSE when comparing MOM and MLE estimators. 
The effect is not as pronounced when considering the other parameters, with the ARMSE of 
both a; and 6; decreasing, but that of (; slightly increasing when comparing the MOM and 
ML estimators. 

Inspecting the performance of the two methods of estimation is interesting, but a more 
relevant question is the ability of the model to recover the latent traits of individuals. This 
was also investigated in the simulation study. For each simulated dataset, both the MOM 
and ML parameter estimates were used to estimate individual traits 0; = E |0;/Y;,T| and 
7; = E|[7,/Y;,T,;| by taking M@ = 20 Monte Carlo draws from the conditional distribution 
and averaging these. In each instance, the correlation coefficients between the imputed latent 
variables and the known true values in the simulated data was computed. The average 
correlations across the 500 simulated datasets are reported in Tables [4] and 


The results in Tables[4Jand[Jare very encouraging. Even at a small sample size of n = 40, 
the latent traits are recovered well as indicated by large average correlations. The model 
using J = 4 items with N = 25 words per item does a much better job of recovering latent 
traits than the model with only J = 2 items with N = 50 words each. This is interesting, 
since the means and variances of the two settings are the same conditional on the latent 
traits. Also note that for the parameter specifications used, reading accuracy ability 6 is 
recovered with greater precision than reading speed ability 7. Overall, increasing the sample 
size results in a larger average correlation, and maximum likelihood performs better at the 
same sample size than method of moments does. 


5 Data Analysis 


The data analyzed in this section were part of the data collected from two public school 
districts in the Pacific Northwest region in the United States. The data consists n = 53 
fourth graders, and measurements (Y;, logT;) were obtained for J = 18 items at the sentence 
level. Sentences vary in length, being as short as just four words and as long as nineteen 
words. The average proportion of words read correctly for these items is in the range 0.828 to 
0.957. For each item, there are a few missing values. This indicates that a specific student did 
not read a paragraph which contained said items. The item-specific missingness rate ranges 
from 0.094 to 0.151. For a given item, a few missing values are not a problem, but there are 
only 35 complete data cases (approximately 66% of the sample). With an assumption that 
values are missing completely at random, the MCEM algorithm adjusting for missing values 
will be used. 
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Table [6] provides a summary of the data, listing the number of words in each item, the 
observed sample means and standard deviations calculated by ignoring missing values, as well 
as the means and standard deviations implied by the model after implementing the MCEM 
algorithm. The model-implied moments are a one-to-one transformation of the estimated 
model parameters along with variance estimate 6? = 0.0469. The estimated correlation 
between the two latent variables is 6 = —0.037. The tabulated model-implied moments 
conform closely to the sample moments shown and and are suggestive that the model has a 
large amount of agreement with the data. There is presently no formal goodness-of-fit test 
for this setting and is a future avenue of research. 

Also presented in this section is an assessment of the predictive power of this model. To 


mimic the idea of out-of-sample predictive ability, the analysis was done as follows. Let = 
denote the maximum likelihood estimators of the model parameters. Let y(_,; and t(_ 


denote the count and log-time vectors for the j“ individual with the i” item removed. 
Also let Si); = S;\ {i}. Let (01 tse m =1,...,M denote M pairs sampled from 


8) Chea taney S|. The estimated latent scores 


M 


(-1)j7 = pe d eee 


and 


are therefore estimates of the latent traits obtained without using information from the i*” 
item. Noting that E [Y;;| = n;® (—aibi/ V1 + a) and E [Y;,;|0;] = ni® [a; (8; — b;)], rea- 
sonable predictors of Y;; are sa = n,;® (- ab; //1+ a 4 and aS =n; la, (4, ag = i;) | 
where An is the best predictor without any information on the latent factor, and ag 
is the predictor using the latent factor estimated without using information from the 
i” item. Similarly, noting that E[T,;] = exp[8;+0?/2+1/(2a?)| and E[T;,|7;] = 
exp [8; — 7; + 1/ (2a?)], predictors ees ) = exp E + 62/2+1/ (242)| and 


A 


iD = exp (4 — T=<a\4 + ly (24? )] are defined. 
97 1/2 
Define the root square prediction error, RSPEY? = isi" Dg, (Yi; - A ') | and 


— 9TH 
RSPES) = sir a (Ty — 7) | for k = 0,1. These values are reported in Table 
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In general, the RSPE values in Table[fJindicate that the latent factors have strong predic- 
tive power. When considering the relative decrease in RSPE, (RS PE — RS PE) /RSPE®), 
these values range between 0.097 and 0.454 for the count data, with an average reduction in 
RSPE of about 24%. For the time data, the values range between 0.122 and 0.507, with an 
average reduction in RSPE of about 30%. 


6 Conclusion 


A latent joint model to measure oral reading speed and accuracy was proposed in this 
paper and a Monte Carlo EM algorithm to estimate the model parameters was derived. 
Overall, it was demonstrated that model parameters were estimated well. Simulation studies 
demonstrated reasonable recovery of model parameters. Also, data analysis demonstrated 
the latent factors had excellent predictive power. However, the quality of estimated speed 
factor scores as reporting oral reading speed is still unknown. Therefore, future research to 
investigate the characteristics of estimated speed factor scores is warranted. 
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13 
7 Appendix 


7.1 Derivation of Some Model Moments 


The derivation of the moments for the ORF model outlined in this paper relies on two 
important results concerning the normal distribution. The first of these two results is given 
here before the derivation the moments of Y;;, while the second result is presented before 
the derivation of the covariance between Y;; and logT;; is shown. 

Result 1: Let Z be standard normal random variable and let a > 0 and b be real num- 
bers. It is then true that E[®(a(Z— b))| = © [—ab/(1+a7)'/?| and E[®? (a(Z —6))] = 
®, [—ab/(1 +.a?)'/?, —ab/(1 +. a?)'/?|p = a?/(1+.a”)] where ®(-) denotes the standard nor- 
mal cdf and ®(-,-|p) denotes the bivariate normal cdf with zero means, unit variances and 
correlation p. 

Proof: Let 2, Z and Z3 be id standard normal random variables. Then, 


E[®(a(Z,—))] = P(Z,<a(Z,—6)) 
= P((1+a?)¥/?Z, < —ab) 
= [-ad/(1+0’)'?] 


where the second line follows from noting that a linear combination of independent normal 
variables is again normal. Also, 
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= fP (Z5 = aZy < —ab, 23 = aZy < —ab) 
= 0, [-ab/(1 + a2)", —ab/(1 + @2)!\p = a2 /(1 + 02)] 


where the last equality follows upon noting that the pair (Z2 — aZ,, Z3 — aZ,) is bivariate 
normal with correlation p = a?/(1+ a’). 


For random variable Y;;, we have by application of the tower property of expectation, 
E|¥;j] = E[E(¥;;|9;)] = NiE [®(ai(9; — &4)] 
and similarly, 


E [YZ] = B [E(¥i|@;)| = Ni(Ni — DE [©7(a:(6; — b,)| + NE [®(a,(0; — b;)] 
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The result required for calculation of the covariance between Yj; and logT;; is now pre- 
sented. 

Result 2: Let Z be standard normal random variable and let a > 0 and 6 be real numbers. 
It is then true that 


E[Z® (a(Z —b))| =E[d(b+a7'Z)] = (20)? (=) 7 exp (355) 


where $(-) denotes the standard normal pdf. 

Proof: The first equality relies on the result E[Z/(Z > x)| = $(x) as well as application 
of the tower property. The second equality follows from some tedious but straightforward 
algebra. 

In evaluating the covariance, consider 


EY; logtss|!. = BE /Y.7|0,| 2 Postylrl 
= NE[9(a;(9; an — 7;)] 
= NPE [®(a:(9; — b;))] — NE [77 ®(ai(9; — b:))| 
= N,BE[®(a;(0; — b))] — iE [(0769; a (0? — O%, )?Z5)B(ai(9; = b:))] 
= N,P,E[®(a;(0; — b:))| — a [0;(a;(0; — b;))] 


where the second-to-last equality makes use of the fact that 6; has the same distribution as 


0,60; + (02 — o2,)'/?Z; where Z; is a standard normal random variable and is independent 
of 6;. It then follows from application of Result 2 that the covariance is given by 


Cov (Yij,loglij) = E[YijlogTi;] — F [Yij] E [logTi,] 
—Nio79E [8;® (a; (8; — 6:))] 


Ne fae 1 a2b? 
— | —=—-— xp | —= : 
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7.2 Sampling Algorithm for MCEM Implementation 


—070 


For the j“ individual, the joint distribution for the vectors of non-missing counts and log- 
times, Y; and T; conditional on the latent variables 6; and 7; is 


1M ig n—Yij 
Fy, log; |a,,r; (yj, t 519), Tj) = II 6 ) ® [ai (9; = b;)]’ {1 =P la; (0; = bs) |} 7 
a 


while the latent traits have distribution 


1 
fo; ry (855 3) = ——— sas sv { - Sa an 
ne Oe (02 — 2)? 2 (a7 — 03,) 


The unconditional distribution of [Y;, logT;| involves a double integral and is therefore im- 
practical for direct use, it is possible to integrate out the latent component 7;, 


(o 205 — 20670;7; + ays . 


fy, logr,.o, (y;; t;, 8;) a ee (Yj, t 5/95, T;) Cee (9;, i) dt; 
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and as [6;|Y;,Tj,S,] is proportional to [Y,, T;, ;,S;], one can show that 


forv,logr, (Vir t,9;) I(" ‘Je [as (8; — Bs} {1 — ® [ai (8; — bi) 


iS; 
P 1[ ba | 4 ( a j (tig — B; 
? v a] 
where A, = Nes, a;. One can therefore use rejection sampling scheme to draw samples 


from [4;/Y;, T;,S; a Let f*(0;) denote the right-hand side of and define 


1 Lato A, 00 
*(9.) — ye ul oS t; 
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Note that g*(@;) is the kernel of a Gaussian distribution with mean 
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and variance 


=i 
2 ae) . 29 
9 (+ aa ee) 
Noting that f* (@) < g* (@) for all 6 and defining 


where 


f*() = Qa; ny a = Yij = a; a Ni Vij 
EON = TT au(™ ) ela (09) = Bf. (0) — a) 


icS; 


Therefore, one can implement rejection sampling in the following way: Generate a normal 
random number R with mean pi, as in (21) and variance oF as in (22). Also generate U 


uniform(0,1). If 


set 6; = R, otherwise repeat. Here, #; represents a random draw from the distribution of 
[A|Y , logT). 
Next, it is easy to verify that the distribution of [7|0, Y, logT] is normal with mean 


)i 


(o? =a7_)'/? 
— et al tty — Bit on) (23) 


Lr\a,Y,T = = 
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and variance - 
Oneyr = (1+A(or-9%,)) . (24) 


Therefore, after using rejection sampling to sample 6, the corresponding value 7 can be 


sampled by plugging 6 into equations and (24) and generating a normal random variable 
with that mean and variance. 


obrg 


Tables and Figures 


| 50 word items 25 word items 


(counts Fzme) | (40, 6.27492) (20, 4.4371?) 
(Limes C2me) | (0.4086, 0.12052) | (0.2043, 0.06022) 


Table 1: Means and variances of words correct count and reading time for items under two 


different simulation scenarios. 


n/2ASE | n/2ARMSE | n¥/2ASE | n1/2ARMSE 


0.670 0.704 0.625 0.628 
2.123 2.164 1.983 2.011 
n = 40 } 
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0.275 0.276 0.278 0.279 
0.092 0.093 0.092 0.093 
0.232 0.233 0.218 0.235 
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Table 2: Average standard error (ASE) and average root mean square error (ARMSE) of 


MOM and EM estimators, N; = 25 and I = 4. 


n/2ASE | n/2ARMSE | n¥/2ASE | 1/2 ARMSE 


0.432 0.434 0.406 0.414 
2.709 2.822 2.611 2.716 
n = 40 : 
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0.294 0.294 0.295 0.295 
0.103 0.103 0.101 0.101 
0.262 0.263 0.242 0.311 
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Table 3: Average standard error (ASE) and average root mean square error (ARMSE) of 


MOM and EM estimators, N; = 50 and I = 2. 
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MOM ML 


Cor 0,0) | 0.9653 | 0.9657 | 0.9667 | 0.9662 | 0.9661 | 0.9668 
Cor (1,7) 0.9350 | 0.9465 | 0.9505 | 0.9468 | 0.9504 | 0.9514 


Table 4: Average empirical correlation between true latent scores and imputed scores with 


M = 20 Monte Carlo draws for model with N; = 25 and I = 4. 


MOM ML 


Cor 0,0) | 0.9396 | 0.9437 | 0.9429 | 0.9413 | 0.9438 | 0.9427 
Cor (1,7) 0.8954 | 0.9087 | 0.9121 | 0.9033 | 0.9116 | 0.9134 


Table 5: Average empirical correlation between true latent scores and imputed scores with 


M = 20 Monte Carlo draws for model with N; = 50 and J = 2. 
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Count Data (Y) Time Data (T) 


Item | N; | Sample Moments | Model Moments | Sample Moments | Model Moments 
(Yi; Sy;) (fiy,, oy;) (ti, St;) (fir, O7;) 


(17.673,2.915) | (7.002, 1.649) (6.993, 1.726 
(139,181) | (7122,1.769 
3 [8 | _c7383,1226) | (ror, 1.268) | (3.281,0.733) | (2309,0.753 
(8.091,0.786 


WS ras wa’ 


(3.110, 0.887 


) ) 
aes (7.444, 1.486) (7.407, 1.447) (2.829, 0.938) (2.812, 0.967 
) ) 


L 6 | 5) (4.644, 0.981) (4.627, 0.939 (2.141, 0.594 (2.128, 0.658 
(9.978, 2.251) (9.991, 2.166) (4.403, 1.786) (4.311, 1.473 
| gf 10, (8.533, 2.007) (8.826, 1.924) (5, 322, 1.895) (5.211, 1.797 


SS rae Jw J rwer—swe 


> fis] axase,2078) | (a.ss5,2.004 | (2201391) | (635,727 
Tao | 9 | 133,202) | (180.1756 | (2600.14) | (2685,1.968 
(5.753, 1.572) (5.768, 1.510 
| 12 | 8 | (7.178,2.026) | (7.061,2.075)_| (3.235, 1.080) (3.226, 0.967 
(6.260, 2.306) (6.256, 2.016 
Pu [a | inno) | 000m) | earaome | ear omar 
Fis [9 | eatin | urine | @7i9.1%%) | 7a) Los 


Table 6: Data summary: observed item moments (MOM) and model-implied item moments 


from EM algorithm (ML). 
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Count Data (Y) Time Data (T) 


Table 7: Leave-item-out root square prediction error for count and time data. RSPE is 


model-free, while RSPE™) is model-specific. 


